One-Sided Prototype Selection on Class Imbalanced Dissimilarity Matrices
Authors
Abstract
In the dissimilarity representation paradigm, several prototype selection methods have been used to address the problem of selecting a small representation set that generates a low-dimensional dissimilarity space. These methods have also been used to reduce the size of the dissimilarity matrix. However, they assume a relatively balanced class distribution, an assumption that is grossly violated in many real-life problems, where the ratios of prior probabilities between classes are often extremely skewed. In this paper, we study the use of well-known prototype selection methods adapted to learning from an imbalanced dissimilarity matrix. More specifically, we propose using these methods to under-sample the majority class in the dissimilarity space. The experimental results demonstrate that the one-sided selection strategy outperforms the classical prototype selection methods applied over all classes.
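The strategy described above, building a dissimilarity space from a small representation set and then under-sampling only the majority class, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Euclidean dissimilarities and a simplified one-sided rule (keep a majority object only if its nearest neighbour in the dissimilarity space also belongs to the majority class). All names and data are hypothetical.

```python
import numpy as np

def dissimilarity_space(X, R):
    # Represent each object by its vector of Euclidean distances
    # to the prototypes in the representation set R.
    return np.linalg.norm(X[:, None, :] - R[None, :, :], axis=2)

def one_sided_undersample(D, y, majority):
    # Keep every minority object; keep a majority object only if its
    # nearest neighbour (in the dissimilarity space) is also majority,
    # discarding borderline/noisy majority examples.
    keep = []
    for i in range(len(y)):
        if y[i] != majority:
            keep.append(i)
        else:
            d = np.linalg.norm(D - D[i], axis=1)
            d[i] = np.inf  # exclude the object itself
            if y[np.argmin(d)] == majority:
                keep.append(i)
    return np.array(keep)

# Toy data: 8 majority points near the origin, 2 minority points offset.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (8, 2)), rng.normal(3, 0.5, (2, 2))])
y = np.array([0] * 8 + [1] * 2)
R = X[:3]                       # small representation set (3 prototypes)
D = dissimilarity_space(X, R)   # 10 x 3 dissimilarity matrix
idx = one_sided_undersample(D, y, majority=0)
```

Note that only the majority class is ever pruned, which is the defining property of the one-sided strategy studied in the paper.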
Similar resources
Prototype Selection in Imbalanced Data for Dissimilarity Representation - A Preliminary Study
In classification problems, the dissimilarity representation has been shown to be more robust than the feature space. To build the dissimilarity space, a representation set of r objects is used. Several methods have been proposed for selecting a suitable representation set that maximizes classification performance. A recurring and crucial challenge in pattern recognition an...
Full text
A Novel One Sided Feature Selection Method for Imbalanced Text Classification
Imbalanced data can be seen in various areas such as text classification, credit card fraud detection, risk management, web page classification, image classification, medical diagnosis/monitoring, and biological data analysis. Classification algorithms tend to favor the large class and may even treat the minority-class data as outliers. Text data is one of t...
Full text
Addressing the Curse of Imbalanced Training Sets: One-Sided Selection
Adding examples of the majority class to the training set can have a detrimental effect on the learner's behavior: noisy or otherwise unreliable examples from the majority class can overwhelm the minority class. The paper discusses criteria to evaluate the utility of classifiers induced from such imbalanced training sets, gives an explanation of the poor behavior of some learners under these circum...
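One-sided selection, as discussed in the work above, typically cleans the class boundary by removing majority-class members of Tomek links: pairs of opposite-class objects that are each other's nearest neighbours. A minimal sketch of Tomek-link detection, assuming Euclidean distances (the data and function name are illustrative, not taken from the paper):

```python
import numpy as np

def tomek_links(X, y):
    # A pair (i, j) is a Tomek link when i and j are mutual nearest
    # neighbours and belong to different classes. One-sided selection
    # would drop the majority member of each such pair.
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(D, np.inf)   # an object is not its own neighbour
    nn = D.argmin(axis=1)         # nearest neighbour of each object
    links = []
    for i in range(n):
        j = int(nn[i])
        if nn[j] == i and y[i] != y[j] and i < j:
            links.append((i, j))
    return links

# Toy data: one minority point wedged into the majority cluster.
X = np.array([[0.0], [1.0], [1.1], [5.0]])
y = np.array([0, 0, 1, 1])
print(tomek_links(X, y))  # → [(1, 2)]
```

Here objects 1 and 2 sit on opposite sides of the class boundary and form the only link; removing the majority member (object 1) is the one-sided cleaning step.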
Full text
Investigation of Term Weighting Schemes in Classification of Imbalanced Texts
The class imbalance problem plays a critical role in the use of machine learning methods for text classification, since both feature selection methods and machine learning methods expect a homogeneous class distribution. This study investigates two different kinds of feature selection metrics (one-sided and two-sided) as a global component of term weighting schemes (called tffs) in scenarios where...
Full text
A Preliminary Study on Selecting the Optimal Cut Points in Discretization by Evolutionary Algorithms (Salvador García, Victoria López, Julián Luengo, Cristóbal J. Carmona and Francisco Herrera)
Full text
Publication date: 2012